Network Attack Signature Generation

ABSTRACT

Described is a technique for detecting attacks on a data communications network having a plurality of addresses for assignment to data processing systems in the network. The technique involves identifying data traffic on the network originating at any assigned address and addressed to any unassigned address. Any data traffic so identified is inspected for data indicative of an attack. On detection of data indicative of an attack, an alert signal is generated.

TECHNICAL FIELD

The present invention generally relates to the generation of attack signatures for use in detecting network attacks and particularly relates to methods, apparatus, and computer program elements for generating attack signatures on a data network.

BACKGROUND OF THE INVENTION

The Internet is a wide area data network formed from a plurality of interconnected data networks. In operation, the Internet facilitates data communication between a range of remotely situated data processing systems. Such data processing systems each typically comprise a central processing unit (CPU), a memory subsystem, an input/output (I/O) subsystem, and computer program code stored in the memory subsystem for execution by the CPU. Typically, end user data processing systems connected to the Internet are referred to as client data processing systems or simply clients. Similarly, data processing systems hosting web sites and services for access by clients via the Internet are referred to as server data processing systems or simply servers. There is a client-server relationship established via the Internet between the end user data processing systems and the hosting data processing systems.

The Internet has become an important communication network for facilitating electronically effected commercial interactions between clients and servers. Access to the Internet is typically provided to such entities via an Internet Service Provider (ISP). Each ISP typically operates a network to which clients or servers subscribe. Each client is provided with an address on the network. Similarly, each server on the network is provided with an address. The network operated by the ISP is connected to the Internet via dedicated data processing systems usually referred to as routers. In operation, the router directs inbound communication traffic from the Internet to specified addresses on the network, such as IP addresses or cellphone addresses (telephone numbers). Similarly, the router directs outbound communication traffic from the network in the direction of specified addresses on the Internet.

A problem faced by many users of data networks is the increasing frequency of electronic attacks on the networks they operate. Such attacks include computer virus attacks, “worm” attacks and denial of service attacks (DOS attacks). Worms and DOS attacks typically introduce significant performance degradation in networks. Infected systems connected to the network typically attempt to spread the infection within the network. Many users do not recognize that their systems are infected. For infected systems, intrusion detection followed by disinfection can be performed in the interest of increasing network performance. To detect an intrusion, an intrusion detection system can make use of so-called attack signatures that have been derived from analyzing known attacks and that characterize those attacks. Some intrusion detection systems utilize a database that contains several attack signatures and compare data traffic against such attack signatures to determine whether the data traffic is likely to pertain to an attack.

US 2002/0143963 A1 describes an apparatus for enhancing the security of a web server from intrusive attacks in the form of HTTP requests. This is accomplished by comparing an incoming request with a predefined list of attack signatures which may comprise at least files, file categories and addresses of known hackers. Action is then taken to reject requests wherein a positive comparison is determined. Further, the web server is notified of relevant data provided in connection with a rejected request for potential future action in accordance with the severity of potential damage and frequency of rejected requests from a given requester.

A widely used solution for generating attack signatures is to monitor hacker mailing lists, and to manually craft attack signatures in response to the attacks that one desires to detect. Wenke Lee and Salvatore J. Stolfo in “A Framework for Constructing Features and Models for Intrusion Detection Systems” in ACM Transactions on Information and System Security (TISSEC), 3(4):227-261, 2000 describe a method of learning attack signatures from attack examples. However, this work usually assumes that examples of attacks exist, so one can learn their characteristics. This is generally not the case and it is still desirable to obtain suitable attack examples.

SUMMARY OF THE INVENTION

In accordance with the present invention, there is now provided a method for generating from requests on a first data network attack signatures for use in a second data network, the method comprising a reception step for receiving data traffic from the first data network addressed to a number of unassigned addresses in a third data network; an inspection step for inspecting several incidents of the data traffic that has been received in the previous step for a common data pattern; and, upon finding the data pattern, a determination step for determining from the corresponding data traffic the attack signature for use in detecting attacks on the second data network. This attack signature generation method makes use of the idea that network traffic directed against an unassigned address is a priori suspicious, and has a higher likelihood of being an actual attack. This higher likelihood is exploited to generate one or more attack signatures that are supposed to lead to a more precise detection of attacks.

The term “unassigned” herein is meant as covering an address that is not assigned to a physical device other than an apparatus for detecting an intrusion or generating an attack signature. The apparatus that is designed to execute the method according to the invention will be the device those “unassigned” addresses are actually assigned to in order to make use of the invention. Those addresses are unassigned insofar as they are not assigned to any device that has another functionality apart from signature generation or intrusion detection. Thereby data traffic that is addressed to such an unassigned address will be received by that apparatus and subjected to the claimed method.

In a preferred embodiment of the invention the method comprises an answer step for spoofing an answer to a source that sent a request contained in the data traffic received. Thereby more information can be obtained from the source of the request.

The answer step above can be executed selectively, using a selection criterion that is dependent on the type of protocol used by the received data traffic. In some protocols the request received already contains enough information to be able to perform the inspection for the common data pattern. In such cases there is no need for sending a spoofed answer to the source.
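The selective answer step can be sketched as a simple lookup. This is only a minimal illustration in Python, not the claimed implementation. The function name `needs_spoofed_answer` and the contents of `SELF_CONTAINED_PROTOCOLS` are assumptions made for the example: a UDP datagram typically carries its payload in the first packet, whereas a TCP connection request carries no application data until the handshake has been answered.

```python
# Hypothetical set of protocol types whose first request already carries
# enough data to inspect; the actual selection criterion is left open.
SELF_CONTAINED_PROTOCOLS = frozenset({"udp"})


def needs_spoofed_answer(protocol: str) -> bool:
    """Return True if a spoofed answer is needed to elicit more data."""
    return protocol.lower() not in SELF_CONTAINED_PROTOCOLS
```

With these assumed values, a TCP request would be answered and a UDP request would be inspected directly.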

If the incidents of data traffic are selected only from those of the sources replying to the spoofed answer and those sources that have not been subjected to the answer step, a reduction of the data used for generating an attack signature is achievable. This reduction is useful since it is deemed to concentrate the data on those incidents that have a higher likelihood of being real attacks and not innocent incidents of data traffic that at first sight look like attacks, also referred to as false positives. The selection is a way of reducing the number of false positives in the signature generation method.
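The selection described above amounts to a filter over the recorded incidents. The following Python sketch shows one possible form; the `Incident` record and its field names are hypothetical and serve only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical record of one incident of data traffic."""
    source: str     # source address of the request
    answered: bool  # True if a spoofed answer was sent to the source
    replied: bool   # True if the source replied to the spoofed answer


def select_incidents(incidents):
    """Keep incidents whose source replied, or was never sent an answer."""
    return [i for i in incidents if i.replied or not i.answered]
```

Sources that were sent a spoofed answer but stayed silent are dropped, which is the reduction aimed at lowering the number of false positives.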

In a preferred embodiment of the invention the inspection step comprises sorting the incidents according to a connection attribute that can be one of the source address, the source port, the protocol type and the destination port of the data traffic. This clustering according to connection attribute values is a way of identifying data traffic as an attack, since the more incidents belong to a cluster the more likely it is that the data traffic is an actual attack.
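Sorting by a connection attribute can be illustrated as follows. This is a sketch under stated assumptions: the `Incident` tuple, its field names, and the function `cluster_by` are not part of the description and are introduced only for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Incident(NamedTuple):
    """Hypothetical incident record holding the four connection attributes."""
    src_addr: str
    src_port: int
    protocol: str
    dst_port: int
    payload: bytes


def cluster_by(incidents, attribute):
    """Group incidents by the value of one connection attribute."""
    clusters = defaultdict(list)
    for incident in incidents:
        clusters[getattr(incident, attribute)].append(incident)
    # Largest clusters first: more incidents suggest a more likely attack.
    return sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
```

Calling `cluster_by(incidents, "dst_port")` would, for instance, place the most frequently targeted destination port at the head of the result.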

In a preferred embodiment of the invention the determination step comprises counting the number of incidents with a common data substring and defining as an attack signature those data substrings whose number exceeds a predetermined number. Hence, the biggest clusters, which represent the most frequently occurring data substrings, are used to generate an attack signature. The frequency of a data substring is used here as another indicator of the likelihood of an attack being a true attack and not a false positive. At the same time, the biggest clusters represent those attacks which, due to their frequency, pose a higher risk to system users.
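The counting in the determination step might be sketched as below. The description does not say how candidate substrings are enumerated; using every substring of one fixed `length` is an assumption made to keep the example small, and the function name `candidate_signatures` is likewise hypothetical.

```python
from collections import Counter


def candidate_signatures(payloads, length, threshold):
    """Return substrings of `length` bytes found in more than `threshold` payloads."""
    counts = Counter()
    for payload in payloads:
        # A set ensures each incident contributes at most once per substring.
        substrings = {
            payload[i:i + length]
            for i in range(len(payload) - length + 1)
        }
        counts.update(substrings)
    # Keep only substrings whose incident count exceeds the predetermined number.
    return {s: n for s, n in counts.items() if n > threshold}
```

Counting incidents rather than raw occurrences prevents a single payload that repeats a byte sequence many times from dominating the result.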

In a preferred embodiment of the invention the attack signature is sent to an intrusion detection system, also referred to herein as intrusion detector, assigned to the second data network. Such intrusion detector can then integrate the attack signature into its signature library and use it to compare it against data traffic for attack identification and handling.

In a preferred embodiment of the invention the first data network and the second data network are selected to be connected to each other, so that the data traffic that is used to generate the attack signature is at least part of the data traffic that goes to the second data network. In a preferred embodiment of the invention the two networks can be of unitary construction or even identical. Also the third data network can be connected to the second data network, or it can be identical with it. The first data network is connected to the third data network in a way that traffic can be directed from the first data network to addresses on the third data network. Any of the aforementioned data networks can also be a super- or subnetwork of the other.

In a preferred embodiment of the invention the method is combined with an attack identification procedure and also comprises steps of receiving data traffic on the second data network and addressed to an unassigned address; inspecting the data traffic received for data indicative of an attack; and, on detection of data indicative of an attack, generating an alert signal. Thereby, the attack signature generated is used for identifying attacks, wherein those data traffic incidents that are directed at an unassigned address are a priori seen as suspicious and subjected to a match test with the generated attack signature. The procedure of utilizing data traffic directed at unassigned addresses can hence be exploited twice for the ultimate purpose of attack identification.

In a preferred embodiment of the invention, on generation of the alert signal, data traffic originating at the address assigned to the data processing system originating the data indicative of the attack is routed to a disinfection address on the network. To this end, the source of the attack is marked as a generic attack source and traffic arriving from it is rerouted to the disinfection server. The system originally targeted by that traffic is decoupled from that traffic and thereby protected.

In a preferred embodiment of the invention on generation of the alert signal, an alert signal is sent to the disinfection address, and the alert signal preferably comprises data indicative of the attack detected. The alert signal is of advantage since it can comprise further information for the disinfection server such as the type of attack, an instruction of how to handle this kind of attack. The alert signal could also comprise the computer program code for handling the attack or disinfecting the system that is the source of the attack.

Viewing the present invention from another aspect, there is now provided an apparatus for generating from requests from a first data network attack signatures for use in a second data network having a plurality of addresses assigned to data processing systems. The apparatus comprises a signature generator for receiving data traffic from the first data network addressed to a number of unassigned addresses in a third data network and arriving at an input interface, inspecting several incidents of the data traffic received for a common data pattern, and upon finding such a data pattern, determining from the corresponding data traffic the attack signature for use in detecting attacks for the second data network.

In a preferred embodiment the apparatus comprises a memory for storing therein the attack signature at least temporarily. The apparatus is preferably designed to spoof replies to sources sending requests contained in the data traffic received. The replies can be sent via a first output interface to the first data network. The first output interface can be preferably combined with the input interface connected to that network.

In a preferred way the apparatus selects the incidents of data traffic only from those of the sources replying to the spoofed answer and those sources that have not been subjected to the spoofed answer. In a preferred embodiment of the invention, the apparatus is designed to sort the incidents according to a connection attribute such as the source address, source port, protocol type or destination port of the data traffic. In an even more preferred embodiment of the invention the attack signature is determinable by the apparatus comprising a counter for counting the number of incidents with a common data substring and defining as an attack signature those data substrings whose number exceeds a predetermined number. The apparatus might preferably have a second output interface for sending the attack signature to an intrusion detection system assigned to the second data network.

In a preferred embodiment of the invention, the apparatus further comprises an intrusion detection sensor, also referred to as intrusion detector, for receiving data traffic from the first data network addressed to an unassigned address of the third data network, inspecting the data traffic received for data indicative of an attack, and, on detection of data indicative of an attack, generating an alert signal.

In a preferred embodiment of the invention, the apparatus further comprises a router connected to the intrusion detector for rerouting data traffic originating at the address assigned to the data processing system originating the data indicative of the attack, to a disinfection address on one of the data networks.

In a preferred embodiment of the invention, the apparatus further comprises a disinfection server assigned to the disinfection address. The disinfection server is designed to send, on receipt of the alert signal, a warning message to the address assigned to the data processing system originating the data indicative of the attack.

The present invention further extends to a computer program element comprising computer program code means which, when loaded in a processor of a data processing system, configures the processor to perform a method for detecting attacks on a data network as herein before described.

The present invention further extends to a method of supporting an entity in the handling of a detected attack by providing instructions for use of, assistance in executing, or execution of disinfection program code. Since a network owner might not have the expertise and capacity to provide for themselves the service of protecting their resources connected to the second communication network from attacks, the method enables such an entity to receive the necessary functionality from outside, preferably remotely via a network connection. The support steps can hence be executed via such a connection and in a preferred embodiment be executed by the attack identification apparatus or the signature generation apparatus that includes such functionality.

The present invention further extends to a method for providing a report to the entity containing information related to one of alert, disinfection, rerouting, logging, discarding of data traffic in the context of a detected attack. Again, in a preferred embodiment such report can be issued and transmitted by the intrusion detector or the signature generation apparatus via a network connection.

The present invention further extends to a method that combines the attack signature generation with billing an entity per attack signature generated. In a preferred embodiment thereof, the charge to be billed is calculated electronically, using as input technical parameters such as the amount of data traffic inspected, the size of the cluster used for the signature generation, the number of addresses monitored or other parameters indicative of the complexity of the attack signature generation.

The present invention further extends to a method that combines the attack signature generation and/or the attack identification with billing an entity for the execution of at least one of the steps in the attack signature generation or handling, the charge being billed preferably being determined in dependence of one of the size of the network, the number of unassigned addresses monitored, the number of assigned addresses monitored, the volume of data traffic inspected, the number of attacks identified, the number of alerts generated, the signature of the identified attack, the volume of rerouted data traffic, the degree of network security achieved, or the turnover of the entity.

It is particularly advantageous to provide the signature generation, forwarding of a signature and the attack identification for several entities and to use technical data derived from the execution of the method for one of the entities for the execution of the same method for another of the entities. A significant saving in resources can be expected if the signature is generated not only for use by a specific entity, such as the owner of the second network, but for a multitude of entities, especially if the networks of those entities are connected to the same or a substantially identical portion of the first, second, or third data network. The signature generation method can in a preferred embodiment comprise a selection step that selects the entities according to a selection criterion preferably derived from a degree of similarity in utility of the generated signature. The more similar the infrastructural components of several entities are with respect to attackability, the more likely it is that those entities are prone to the same type of attack and the more similar are their needs to receive the same type of attack signature.

The present invention further extends to a method for deploying a signature generation application to an entity, comprising a step of connecting a signature generator to a third data network for generating from requests from a thereto-connected first data network attack signatures for use in a second data network used by the entity and having a plurality of addresses assigned to data processing systems, a step of setting up the signature generator to generate an attack signature under use of the described signature generation method and a step of setting up the signature generator to send the generated attack signature to an attack identification device connected to the second data network.

There can be a large number of unassigned addresses on a given network. In a particularly preferred embodiment of the present invention, the signature generator listens on the network for traffic directed toward the unassigned addresses. No such traffic should exist, since those addresses are unadvertized or unpublished. In the event that a request sent to one of the unassigned addresses is detected, the signature generator may spoof an answer to the request, unless the request uses a predetermined protocol type that already delivers the information necessary for the inspection. The unassigned addresses are not in use by a device that has another functionality than signature generation or intrusion detection. Thus, an attempt to contact, for example, a server at such an address is a priori suspicious. The signature generator then listens for a reply to the spoofed answer and uses the reply for the signature generation process.

BRIEF DESCRIPTION OF THE FIGURES

Preferred embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:

FIG. 1 is a block diagram of a data processing system;

FIG. 2 is a block diagram of a data processing network;

FIG. 3 is a block diagram of an intrusion detector;

FIG. 4 is a flow diagram associated with the intrusion detector;

FIG. 5 is a block diagram of a signature generator;

FIG. 6 is a flow diagram associated with the signature generator; and

FIG. 7 is a block diagram of a signature generator working for an intrusion detector.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring first to FIG. 1, a data processing system comprises a CPU 10, an I/O subsystem 20, and a memory subsystem 40, all interconnected by a bus subsystem 30. The memory subsystem 40 may comprise random access memory (RAM), read only memory (ROM), and one or more data storage devices such as hard disk drives, optical disk drives, and the like. The I/O subsystem 20 may comprise: a display; a printer; a keyboard; a pointing device such as a mouse, tracker ball, or the like; and one or more network connections permitting communication between the data processing system and one or more similar systems and/or peripheral devices via a data network. The combination of such systems and devices interconnected by such a network may itself form a distributed data processing system. Such distributed systems may be themselves interconnected by additional data networks.

In the memory subsystem 40 are stored data 60 and computer program code 50 executable by the CPU 10. The program code 50 includes operating system software 90 and application software 80. The operating system software 90, when executed by the CPU 10, provides a platform on which the application software 80 can be executed.

Referring now to FIG. 2, in a preferred embodiment of the present invention, there is provided a data network 100, also referred to as third data network 100, having a plurality of addresses 110 for assignment to data processing systems in the network. The third data network 100 has a plurality of assignable Internet Protocol (IP) addresses 110. The third data network 100 is connected to the Internet 120, also referred to as first data network 120, via a router 130. The router 130 may be implemented in the form of a data processing system as herein before described with reference to FIG. 1, dedicated by appropriate programming to the task of routing communication traffic in the form of data packets between the Internet 120 and the third data network 100 based on address data specified in the data packets. A first group 140 of the addresses 110 on the third data network 100 are assigned to systems 150 belonging to users of an Internet service. Each system 150 may be a data processing system as herein before described with reference to FIG. 1. A second group 160 of the addresses 110 on the third data network 100 are free, i.e. unassigned. More specifically, the second group 160 of addresses 110 is not assigned to user systems 150, such as servers or clients. An intrusion detection system (IDS) 170 is also connected to the third data network 100. The IDS 170 is also connected to the router 130. Details of the IDS 170 will be provided further below. The router 130 is connected to a disinfection server 180. The disinfection server 180 may be implemented by a data processing system as herein before described with reference to FIG. 1. A signature generator 190 is connected to the router 130 and the Internet 120, and is also connected to the intrusion detector 170.

With reference to FIG. 3, in a particularly preferred embodiment of the present invention, the IDS 170 comprises a data processing system as herein before described with reference to FIG. 1. The application software 80 of the IDS 170 includes intrusion detection code 200. The data 60 stored in the memory subsystem 40 of the IDS 170 includes attack identity data 210 and disinfection data 220. The data 60 also includes a record of which of the addresses on the third data network 100 are unassigned and belong to the second group 160, and which of the addresses 110 on the third data network 100 are assigned to data processing systems 150 and belong to the first group 140. The record is updated each time another address is allocated or an existing address allocation is removed. The attack identity data 210 contains data indicative of signatures identifying known attacks. The disinfection data 220 contains data indicative of: the nature of each attack; how to disinfect a system infected with each attack; and how to resume normal network connectivity. The attack identity data 210 and disinfection data 220 are cross referenced. The intrusion detection code 200, when executed by the CPU 10, configures the IDS 170 to operate in accordance with the flow diagram shown in FIG. 4.
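The record of assigned and unassigned addresses could be kept along the following lines. This is a minimal sketch using Python's standard `ipaddress` module; the class `AddressRecord` and its method names are hypothetical, and the description does not prescribe any particular data structure.

```python
import ipaddress


class AddressRecord:
    """Tracks which addresses of a network are assigned (group 140) or free (group 160)."""

    def __init__(self, network: str):
        self.network = ipaddress.ip_network(network)
        self.assigned = set()

    def allocate(self, address: str) -> None:
        """Record a new address allocation."""
        addr = ipaddress.ip_address(address)
        if addr not in self.network:
            raise ValueError(f"{addr} is not on {self.network}")
        self.assigned.add(addr)

    def release(self, address: str) -> None:
        """Record the removal of an existing allocation."""
        self.assigned.discard(ipaddress.ip_address(address))

    def is_unassigned(self, address: str) -> bool:
        """True if the address is on the network and not currently assigned."""
        addr = ipaddress.ip_address(address)
        return addr in self.network and addr not in self.assigned
```

Storing only the assigned set keeps updates cheap; every other address on the network is treated as belonging to the unassigned group.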

Referring now to an exemplary embodiment depicted in FIG. 4, in operation, the IDS 170 receives data traffic on the third data network 100 originating at an assigned address 140 and addressed to an unassigned address 160. The IDS 170 inspects the data traffic received for data indicative of an attack. On detection of data indicative of an attack, the IDS 170 generates an alert signal. In a preferred embodiment of the present invention, on generation of the alert signal, the data traffic originating at the address 140 assigned to the data processing system 150 originating the data indicative of the attack is rerouted to a disinfection address on the third data network 100. In a particularly preferred embodiment of the present invention, the IDS 170 listens on the third data network 100 and receives traffic directed toward the unassigned addresses 160. Specifically, at block 300, the IDS 170 examines requests sent from addresses 140 on the third data network 100 to determine, at block 310, if the request specifies one of the unassigned addresses 160 as the destination address. If the request does not specify one of the unassigned addresses 160, then, at block 320, the IDS 170 waits for the next request to examine. If, however, the request specifies at least one of the unassigned addresses 160, then, at block 330, the IDS 170 spoofs an answer to the request.

The identification may also be realized by assigning the unassigned addresses to the IDS 170, such that any traffic directed at an unassigned address automatically arrives at the IDS 170.

The answer is sent to the source address on the third data network 100. The unassigned addresses 160 are not in use otherwise than being inspected for attacks. Thus, an attempt to contact, for example, a system at such an address is a priori suspicious. At block 340, the IDS 170 listens for a reply to the spoofed answer. The IDS 170 may time out if no reply is received within a predetermined period, in which case, at block 320, the IDS 170 waits for the next request to examine. If a reply is however received, then, at block 350, the IDS 170 compares the suspect request and reply with the attack identity data 210 stored in the memory subsystem 40. If, at block 350, the comparison fails to identify an attack, then, at block 320, the IDS 170 waits for the next request to examine. If, however, the comparison at block 350 detects a diagnosable attack in the reply, then the IDS 170 determines that the source system 150 is infected. Accordingly, at block 360, the IDS 170 generates the alert signal. The alert signal is sent to the router 130. The alert signal instructs the router 130 to divert all traffic from the infected system 150 to the disinfection address. This is a rerouting step for rerouting the data traffic originating at the address assigned to the data processing system originating the data indicative of the attack to a disinfection address on the third data network 100.
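The flow of blocks 300 to 360 can be summarized in a short sketch. It is an illustration only: the dictionary-shaped request, the treatment of signatures as byte substrings, and the injected callables `spoof_answer`, `wait_for_reply` and `send_alert` are all assumptions, since the description leaves the packet handling and the matching method open.

```python
def handle_request(request, unassigned, spoof_answer, wait_for_reply,
                   signatures, send_alert):
    """Process one request; return the matched signature or None."""
    # Block 310: ignore requests not addressed to an unassigned address.
    if request["dst"] not in unassigned:
        return None  # block 320: wait for the next request
    # Block 330: spoof an answer to the suspicious request.
    spoof_answer(request)
    # Block 340: listen for a reply; None models a timeout.
    reply = wait_for_reply(request)
    if reply is None:
        return None
    # Block 350: compare request and reply with the attack identity data.
    for signature in signatures:
        if signature in request["payload"] or signature in reply:
            # Block 360: alert the router about the infected source.
            send_alert(request["src"], signature)
            return signature
    return None
```

Passing the network operations in as callables keeps the decision logic separate from the transport, so that the same flow can be exercised without a live network.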

Referring back to FIG. 2, in a particularly preferred embodiment of the present invention, a disinfection server 180 is located at the disinfection address. In a preferred embodiment of the present invention, on generation of the alert signal, the IDS 170 sends an alert signal to the disinfection address. Preferably, the alert signal comprises data indicative of the attack detected. Hence, on generation of the alert signal, an alert step is carried out for sending the alert signal to the disinfection address, the alert signal preferably comprising data indicative of the attack detected.

Accordingly, in a particularly preferred embodiment of the present invention, the IDS 170 retrieves the disinfection data 220 corresponding to the attack detected from the memory subsystem 40. At block 370, the IDS 170 sends the alert signal containing retrieved disinfection data to the disinfection address at which the disinfection server 180 resides. Then, at block 320, the IDS 170 waits for the next request to examine. Each request, answer, and reply may be embodied in one or more packets of data traffic on the third data network 100. Accordingly, the signature of each attack may span more than one packet.

In a preferred embodiment of the present invention, the disinfection data 220 sent to the disinfection server 180 contains data indicative of: the nature of the attack detected; how to disinfect the system 150 infected with the attack; and how to resume normal network connectivity. On receipt of the disinfection data 220 from the IDS 170, the disinfection server 180 sets about curing the infected system 150 and restoring the third data network 100. In another preferred embodiment of the present invention, the disinfection data 220 contains only data indicative of the nature of the attack. The disinfection server then selects, based on the nature of the attack, one of a plurality of prestored techniques for disinfecting the infected system 150 and/or restoring the third data network 100 and executes the selected technique. The attacks may take many different forms. Accordingly, the corresponding techniques for disinfection and network restoration may vary widely from one attack to the next.
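Selecting one of a plurality of prestored techniques by the nature of the attack can be pictured as a lookup table. In the Python sketch below, the technique functions, the attack names in `TECHNIQUES`, and the fallback behaviour are placeholders invented for the example; the description does not specify which techniques exist.

```python
def quarantine(system):
    """Placeholder technique: isolate the infected system."""
    return f"quarantined {system}"


def notify_user(system):
    """Placeholder technique: send a warning to the system's user."""
    return f"warned user of {system}"


# Hypothetical mapping from the nature of an attack to a prestored technique.
TECHNIQUES = {"worm": quarantine, "virus": notify_user}


def disinfect(nature, system, default=notify_user):
    """Select and execute the technique matching the nature of the attack."""
    technique = TECHNIQUES.get(nature, default)
    return technique(system)
```

A table of this kind lets new techniques be registered without changing the selection logic.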

In a preferred embodiment of the present invention, on receipt of the disinfection data, the disinfection server 180 sends a warning message to the infected system 150. The warning message informs the user of the infected system 150 that their system 150 is infected. The warning message may instruct the user to run anti-virus software prestored in the infected system 150 to eliminate or otherwise isolate the infection. Alternatively, the warning message may contain disinfection program code for eliminating the attack from the infected system 150, together with instructions to assist the user in executing the disinfection code on the infected system 150. In another alternative, the warning message may direct the user to another web site, at which appropriate disinfection program code is provided. In another preferred embodiment of the present invention, the warning message contains disinfection program code that, when loaded into the infected system, executes automatically, thus eliminating or otherwise isolating the infection in a manner which is transparent to the user. Other disinfection schemes are possible.

In the embodiments of the present invention herein before described, the disinfection server 180 is implemented in a single data processing system such as that herein before described with reference to FIG. 1. However, in other embodiments of the present invention, the disinfection server 180 may be implemented by multiple interconnected data processing systems. Such data processing systems may be distributed or located together in a “farm”. Each data processing system in the disinfection server may be dedicated to handling a different attack. The IDS 170 may also be implemented by multiple integrated data processing systems. Alternatively, the IDS 170 and the disinfection server 180 may be integrated in a single data processing system.

The traffic on the third data network 100 sent from the infected system 150 and deflected by the router 130 to the disinfection server 180 may be logged and/or discarded by the disinfection server 180. In the embodiments of the present invention hereinbefore described, the IDS 170 sends disinfection data 220 to the disinfection server 180. However, in other embodiments of the present invention, once an infection is detected, the IDS 170 may simply instruct the router 130 to deflect traffic from the infected system 150 to the disinfection server 180 without the IDS 170 additionally supplying disinfection data 220 to the disinfection server 180. The disinfection server 180 may then simply act as a repository for traffic originating in the infected system 150, logging and/or discarding traffic it receives from the infected system 150. The logging and discarding may be reported by the disinfection server 180 to an administrator of the third data network 100. Such reports may be delivered periodically or in real time. The reporting may be performed via, for example, an administration console. However, other reporting techniques, such as printed output for example, are possible. On receipt of such reports, administrators can take actions appropriate for eliminating or otherwise containing the infection of the third data network 100.

Referring to FIG. 5, the signature generator 190 has a structure that is to some extent similar to that of the IDS 170. The signature generator 190 comprises a data processing system as hereinbefore described with reference to FIG. 1. The application software 80 of the signature generator 190 includes signature generation code 230.

The data 60 stored in the memory subsystem 40 of the signature generator 190 includes:

- request data that the IDS has captured, that is, data that arrives with a request;
- reply data, that is, data that specifies to which type of request which type of reply can be spoofed; and
- response data, that is, data that arrives in response to spoofed replies.

The data 60 also includes a record of which of the addresses on the third data network 100 are unassigned and belong to the second group 160, and which of the addresses 110 on the third data network 100 are assigned to data processing systems 150 and belong to the first group 140. The record is updated each time another address is allocated or an existing address allocation is removed.
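By way of illustration only, such a record might be kept as in the following minimal sketch. The class name `AddressRecord`, its method names, and the use of IP addresses are assumptions made for the example and are not taken from the described embodiment.

```python
import ipaddress


class AddressRecord:
    """Hypothetical record of assigned (first group) and unassigned
    (second group) addresses on one network."""

    def __init__(self, network: str):
        self.network = ipaddress.ip_network(network)
        self.assigned = set()

    def allocate(self, address: str) -> None:
        """Update the record when an address is allocated to a system."""
        addr = ipaddress.ip_address(address)
        if addr not in self.network:
            raise ValueError(f"{address} is not on {self.network}")
        self.assigned.add(addr)

    def release(self, address: str) -> None:
        """Update the record when an existing allocation is removed."""
        self.assigned.discard(ipaddress.ip_address(address))

    def is_unassigned(self, address: str) -> bool:
        """True if the address is on the network but allocated to no system."""
        addr = ipaddress.ip_address(address)
        return addr in self.network and addr not in self.assigned
```

In this sketch every address of the network that has not been explicitly allocated is treated as belonging to the unassigned group.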

The signature generation code 230, when executed by the CPU 10, configures the signature generator 190 to operate in accordance with the flow diagram shown in FIG. 6.

Referring now to FIG. 6, in operation, the signature generator 190 identifies data traffic originating from the first network 120 addressed to an unassigned address 160 of the third network 100. Specifically, at block 400, the signature generator 190 examines requests received from the Internet 120 to determine, at block 410, if the request specifies one or more of the unassigned addresses 160 as the destination address. If the request does not specify at least one of the unassigned addresses 160, then, at block 420, the signature generator 190 waits for the next request to examine. The request passes through the signature generator 190 to the router 130. The above step sequence is referenced in FIG. 6 as path via 400.1. In other words, a reception and identification step is performed that receives and identifies the data traffic on the first network 120 addressed to a number of unassigned addresses in the third data network 100.

As an alternative to the above described reception and identification step in block 410, the signature generator 190 can skip this step if the signature generator 190 is by default assigned to the unassigned addresses 160. In this case data traffic arriving at the signature generator 190 is by definition directed to an unassigned address 160 and the process jumps from block 400 directly to block 430, referenced in FIG. 6 as path via 400.2. This is a reception step for receiving data traffic from the first data network 120 addressed to a number of unassigned addresses in the third data network 100.

If the request specifies one of the unassigned addresses 160, then, at block 430, the signature generator 190 spoofs an answer to the request. The answer is sent to the source address on the Internet 120. Hence an answer step is executed for spoofing an answer to the source that sent the request contained in the received data traffic. The unassigned addresses 160 are not in use by a device other than the signature generator 190. Thus, an attempt to contact, for example, a system at such an address is a priori suspicious. At block 440, the signature generator 190 listens for a reply to the spoofed answer. The signature generator 190 may time out if no reply is received within a predetermined period, in which case, at block 420, the signature generator 190 waits for the next request to examine. If, however, a reply is received, then, at block 450, the signature generator 190 performs a signature generation algorithm.

Dependent on the protocol that is used by a request, a spoofing of an answer may not be necessary. In such a case the signature generator 190 may skip the step of spoofing an answer in block 430 and the step of listening to a reply in block 440. For those requests, the original response is used as a substitute for the reply to the spoofed answer. The answer step is hence in that case executed selectively based on a selection criterion that is dependent on the type of protocol of the received data traffic. The selection criterion may also include additional criteria besides the type of protocol, e.g. the criterion of repetition, i.e. whether the received data traffic has been identified as identical with previously received data traffic.
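The sequence of blocks 410 to 450, including the selective answer step, might be sketched as follows. This is a simplified illustration under stated assumptions: the function name `process_request`, the set `SPOOFED_PROTOCOLS`, and the `spoof_and_listen` callback (which stands for blocks 430 and 440 and returns nothing on a timeout) are hypothetical and not part of the described embodiment.

```python
from typing import Callable, Optional, Set

# Hypothetical selection criterion: protocols for which an answer is spoofed.
SPOOFED_PROTOCOLS = {"tcp"}


def process_request(dst: str, protocol: str, payload: bytes,
                    unassigned: Set[str],
                    spoof_and_listen: Callable[[], Optional[bytes]]
                    ) -> Optional[bytes]:
    """Return the data to pass to signature generation (block 450),
    or None if the generator should wait for the next request (block 420)."""
    # Block 410: only traffic addressed to an unassigned address is of interest.
    if dst not in unassigned:
        return None
    # Selective answer step: for some protocols no spoofed answer is needed,
    # and the originally received data is used in place of a reply.
    if protocol not in SPOOFED_PROTOCOLS:
        return payload
    # Blocks 430 and 440: spoof an answer and listen for a reply.
    # The callback is assumed to return None when the listener times out.
    return spoof_and_listen()
```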

The signature generation algorithm comprises an inspection step in which several incidents of data traffic that are addressed to a number of unassigned addresses in the third data network are inspected for occurrence of a common data pattern.

In a preferred embodiment thereof, the incidents of data traffic are selected only from those of the sources replying to the spoofed answer and sources that have not been subjected to the answer step. In other words, the requests from the non-replying sources are not used for the signature generation. This reduces the data that is used for the inspection step.

The inspection step may comprise sorting the incidents according to a connection attribute such as one of source address, source port, protocol type and destination port of the data traffic.

Thereafter follows a determination step for, upon finding such a common data pattern, determining from the corresponding data traffic an attack signature for use in detecting attacks for the second data network. The determination step, in a preferred embodiment, comprises counting the number of incidents of the data traffic with a common data substring and defining as the attack signature those data substrings whose number exceeds a predetermined number. Hence substring clusters are built with the cluster size being the number of incidents that contain the corresponding data substring. As soon as the cluster size exceeds the predetermined number, that cluster's substring is defined as attack signature.
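The counting of incidents per substring cluster might, for example, be sketched as below. As a simplifying assumption of this sketch, only substrings of one fixed length are considered; the function names and parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def substring_clusters(incidents: Iterable[bytes], length: int) -> Counter:
    """Map each substring of the given length to its cluster size, i.e. the
    number of incidents that contain it at least once."""
    counts = Counter()
    for data in incidents:
        # A set ensures an incident is counted once per distinct substring.
        windows = {data[i:i + length] for i in range(len(data) - length + 1)}
        counts.update(windows)
    return counts


def attack_signatures(incidents: Iterable[bytes], length: int,
                      predetermined_number: int) -> Set[bytes]:
    """Substrings whose cluster size exceeds the predetermined number."""
    clusters = substring_clusters(incidents, length)
    return {s for s, size in clusters.items() if size > predetermined_number}
```

With the three incidents `b"GET /a.exe"`, `b"GET /b.exe"` and `b"PUT /a.txt"`, a length of 4 and a predetermined number of 1, the substring `b".exe"` is contained in two incidents and is therefore defined as a signature in this sketch, whereas `b"PUT "` occurs in only one incident and is not.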

In the following paragraphs a preferred embodiment of such a signature generation algorithm is explained.

The received reply is defined as input in the form of a set S = {<src_i, dst_i, dpt_i, request_i> | i ∈ I} of tuples, wherein src_i is the source address, dst_i is the destination address, dpt_i is the destination port, which is typically associated with a particular service, and request_i is the request that the source sent to one of the unassigned addresses but that eventually arrives at the signature generator 190. The destination port dpt_i is also referred to as the service type. A method is performed in the signature generator 190 that has the following two functionalities:

1. Find per-service signatures:

1.1 Group the tuples in S by destination port, and associate each destination port p with the set R(p) = {request_j | <*, *, p, request_j> in S} of requests that were issued against this port p.

1.2 For each port p, find all frequent substrings in R(p) that have a predetermined minimum length. These frequent substrings are the per-service signatures for the service that runs on port p.
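Steps 1.1 and 1.2 might be sketched as follows. The sketch makes two assumptions that are not taken from the description: only windows of exactly the minimum length are counted (any longer frequent substring necessarily contains such windows), and "frequent" is taken to mean occurring in more than `threshold` requests. All names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# A tuple <src, dst, dpt, request> of the set S.
Incident = Tuple[str, str, int, bytes]


def per_service_signatures(S: Iterable[Incident], min_len: int,
                           threshold: int) -> Dict[int, Set[bytes]]:
    # Step 1.1: group the requests by destination port, giving R(p).
    requests_by_port = defaultdict(list)
    for _src, _dst, dpt, request in S:
        requests_by_port[dpt].append(request)

    # Step 1.2: per port, count in how many requests each window occurs.
    signatures = {}
    for port, requests in requests_by_port.items():
        counts = Counter()
        for req in requests:
            counts.update({req[i:i + min_len]
                           for i in range(len(req) - min_len + 1)})
        signatures[port] = {s for s, n in counts.items() if n > threshold}
    return signatures
```

Because the counting is done separately for each port, a substring that occurs once on port 80 and once on port 25 is not pooled into one cluster.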

The find per-service signatures algorithm monitors the requests that occur and are directed to a specific port and analyzes them for frequent common substrings of the predetermined minimum length. The substrings are the pieces of the request that characterize it as an attack. Requests with common substrings are assumed to represent the same type of attack. The port-specific handling makes use of the fact that typically attacks that are directed at different ports are too different to be captured by the same signature. A common substring occurring in requests directed at different ports is therefore likely to be coincidental and hence with a higher likelihood not part of an attack. The port-specific handling hence leads to signatures that reduce the likelihood of false positives, i.e. the erroneous identification and handling of a request as an attack, the request factually being innocent.

The above algorithm for finding per-service signatures can in a preferred embodiment be complemented by a second algorithm:

2. Find attack-tool signatures:

2.1 Group the tuples in S by source, and associate each source s with the set R′(s) = {request_j | <s, *, *, request_j> in S} of requests it made.

2.2 For each R′(s), replace all requests in R′(s) by the per-service signatures they match (see 1). The resulting set R*(s) of per-service signatures is the attack-tool signature.
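The grouping per source and the replacement of requests by matching per-service signatures might be sketched as follows. The sketch assumes that a request "matches" a per-service signature when the signature of the port it was sent to occurs as a substring of the request, and it keeps the port together with each signature; both choices, like the names used, are assumptions of the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A tuple <src, dst, dpt, request> of the set S.
Incident = Tuple[str, str, int, bytes]


def attack_tool_signatures(S: Iterable[Incident],
                           per_service: Dict[int, Set[bytes]]
                           ) -> Dict[str, Set[Tuple[int, bytes]]]:
    """For each source s, build R*(s): the per-service signatures matched
    by the requests that s made, regardless of the port they were sent to."""
    tool = defaultdict(set)
    for src, _dst, dpt, request in S:
        for sig in per_service.get(dpt, ()):
            if sig in request:  # substring match against the request
                tool[src].add((dpt, sig))
    return dict(tool)
```

A source whose requests match no per-service signature does not appear in the result of this sketch.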

The Find attack-tool signatures part of the algorithm uses a grouping per source and analyzes requests coming from that source, regardless of the port they are directed to. This method makes use of the fact that some attacks show a changing pattern of ports to outsmart intrusion detectors. In this case the signature of an attack would provide an identification of an attack by its source together with a certain substring.

In a preferred embodiment this algorithm might be modified in that dynamic weighting of the sources according to the number of different contacted destination addresses is applied.

For instance, the predetermined number of data substrings that is to be exceeded to define those data substrings as an attack signature can be adapted. That number may be selected to be lower for sources that direct data traffic to more than one destination address. In particular, that number may be selected reciprocally dependent on the number of destination addresses. This takes into account the fact that the more addresses a data traffic is directed to, the more suspicious the source is and the more likely it is that the data traffic is an actual attack.
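One possible reading of a number that is reciprocally dependent on the number of destination addresses is a base value divided by that number. The following sketch illustrates this reading only; the function names and the exact formula are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A tuple <src, dst, dpt, request> of the set S.
Incident = Tuple[str, str, int, bytes]


def destinations_per_source(S: Iterable[Incident]) -> Dict[str, int]:
    """Number of different destination addresses contacted by each source."""
    seen = defaultdict(set)
    for src, dst, _dpt, _request in S:
        seen[src].add(dst)
    return {src: len(dsts) for src, dsts in seen.items()}


def weighted_threshold(base: float, num_destinations: int) -> float:
    """Assumed weighting: the number to be exceeded shrinks in inverse
    proportion to the number of destination addresses contacted."""
    if num_destinations < 1:
        raise ValueError("a source contacts at least one destination")
    return base / num_destinations
```

Under this assumed formula, a source that contacted two destination addresses would need only half as many incidents with a common substring as a source that contacted one.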

Also, dynamic weighting of the accuracy of the signatures according to related known properties of a source is possible. In particular, if a given source is known to be an attacker, based upon other previously known signatures, then it is probable that a consistent but unknown activity from that same source is also an attack.

Furthermore, timing information can be included. Attacks are often performed by packaged hacker tools. Such tools do not generally randomize the attack sequence so that time sequence analysis can determine not only the attack but also the tool running the attack.

The output of the signature generation algorithm is an attack signature that can be used by the intrusion detector 170 to recognize requests as attacks. Hence, using the received responses to spoofed replies to requests directed at unassigned addresses 160, the signature generator 190 derives patterns that are translated into attack signatures. Those attack signatures are forwarded in block 460 to the intrusion detector 170, where the attack signatures are integrated into the attack identification data 210. Hence, by sending the attack signature to the intrusion detector 170 that is assigned to the second data network, the signature generator 190 updates the attack identification data 210 that is used by the IDS 170. The described method is not only useful for improving attack identification in the third data network 100, but can be more generally applied for a second network. This means that although the attack signature has been generated by listening to requests coming from the first data network and being directed to unassigned addresses in the third data network, those same attack signatures can be sent to a second data network for use in an intrusion detection method therein. Whereas all three networks can be distinct, the described method works with any combination thereof, such as any of the networks being one of seamlessly connected to, integrated in, connected to, a part of, a subnet of, a supernet of, partially identical with, of unitary construction with, partially connected to, or partially integrated in one or more of the other networks. In the embodiment depicted in FIG. 2, the second network and the third network 100 are identical.

The generation of the signature in block 450 is in a preferred way executed based on a multitude of received requests and/or replies to spoofed answers. For requests that do not need a sequence of spoofing an answer and listening to a reply, the receipt of those requests in block 400 is performed for a multitude of requests, referenced in FIG. 6 as loop 401. For requests that are subjected to a sequence of spoofing an answer (block 430) and listening to a reply (block 440), the sequence of blocks 400, 430, 440 is performed for a multitude of requests, referenced in FIG. 6 as loop 441.

Another preferred embodiment comprises an execution of the signature generation algorithm in block 450 and is followed by a reexecution of that algorithm using additional input of examined requests and/or replies. Thereby the outcome of the signature generation algorithm is updated over time.

In a preferred deployment of the signature generator, it is placed on a direct link to the Internet. In that way, it is subjected to a maximum amount of hostile attack activity and the above algorithm can derive a larger number of attack signatures. Once derived, the attack signatures can be deployed to an IDS. Such an IDS can also reside in a different network configuration, e.g. behind a firewall.

In a preferred embodiment of the present invention, there is furthermore provided a data network comprising: a router for connecting a plurality of data processing systems to the Internet; an intrusion detection system (IDS) connected to the router and connected to the signature generator; and a disinfection server also connected to the router. In response to the IDS detecting, using attack signatures that have been generated by the signature generator, that one of the data processing systems is infected by an attack, the IDS instructs the router to deflect all network traffic from the infected system to the disinfection server. The IDS in addition supplies disinfection data to the disinfection server. The disinfection data is indicative of the nature of the infection, how to disinfect the infected system, and how to resume normal network connectivity.

In FIG. 7, an arrangement of three networks is shown. The first data network 120 is connected to the third data network 100 and to the second data network 70 via the router 130. The third data network 100 comprises the assigned addresses 140 and the signature generator 190, which is allocated to the unassigned addresses and connected to the intrusion detector 170. The router 130 is connected to the intrusion detector 170 and the disinfection server 180. The intrusion detector 170 is allocated to the unassigned addresses of the second data network 70, which also has assigned addresses 71. The connection of any of the aforementioned can be realized directly or indirectly, e.g. via any of the network connections available.

In this embodiment, the data traffic directed towards the unassigned addresses 160 in the third data network arrives automatically at the signature generator 190 which, using its signature generation algorithm, generates an attack signature therefrom. The data traffic arriving at the signature generator 190 can originate in the first network 120 but also come from within the third data network 100.

The generated attack signature is sent by the signature generator 190 to the IDS 170, which can make use of the attack signature to identify incidents of arriving data traffic as attacks. Again, since the intrusion detector 170 is allocated to the unassigned addresses in the second network 70, data traffic that is directed towards the unassigned addresses in the second network 70 is automatically subjected to an attack identification process by the IDS 170 using the attack signature received. Here too, the data traffic arriving at the IDS 170 can originate in the first network 120 but also come from within the second data network 70.

Once an attack has been identified as such, the intrusion detector 170 alerts the router 130 of the existence of an attack. The router 130 can then redirect the data traffic that arrives from the identified source of the attacks to the disinfection server 180. If the attacking source resides within the third data network 100, it can be subjected to a disinfection process.

It is furthermore possible to deploy a signature generation application to an entity, comprising the following steps: connecting the signature generator to the first data network for generating from requests thereon attack signatures for use in the second data network used by said entity and having a plurality of addresses assigned to data processing systems, setting up the signature generator to generate an attack signature under use of the aforedescribed method, and setting up the signature generator to send the generated attack signature to an intrusion detector connected to the second data network.

A signature generator is in a preferred embodiment a “server with no services”. More precisely, it is characterized by two properties: First, it is a security-hardened host that, while offering no real services, listens on all security-relevant ports and logs all incoming requests. Second, it is not advertised in any way, i.e. there are no DNS entries, web links, or other pointers to it. Because it is not advertised, a machine that contacts a signature generator is with near certainty a hacker or worm looking for a target to exploit. As the signature generator logs all incoming requests, it gets hold of the attacks that the hacker or worm is using. Hence, the signature generator is an advantageous source of actual attacks, from which then automatically attack signatures are derived.

The functionality described for the signature generator 190 can in a preferred embodiment also be integrated into the IDS 170, such that the IDS 170 derives from the replies to its spoofed answers attack signatures that are added to its attack identification data 210. In the embodiments of the present invention herein before described, the IDS 170, router 130, and disinfection server 180 are implemented by data processing systems programmed with appropriate program code. However, it will be appreciated that, in other embodiments of the present invention, one or more of the functions described herein as being implemented in software may be implemented at least partially in hardwired logic circuitry.

It will also be appreciated that the attack detection methods described herein may be implemented by the service provider responsible for the third data network 100, or at least partially by a third party in the form of a service to the service provider. Such a service may differentiate the service offered by the service provider from the services provided by its competitors. Such differentiated services may be optionally supplied to end users of the network service provided in exchange for an additional premium.

The method described can be completed by providing a report to an entity, wherein the report contains information related to one of alert, disinfection, rerouting, logging, discarding of data traffic in the context of a detected attack.

The service of generating attack signatures for networks used by an entity other than the service provider, may in a preferred embodiment comprise billing for the service delivered. The charge to be billed may therein be determined in dependence of one or more of a number of factors that typically are indicative of the complexity or workload experienced by the service provider. Such factors indicative of volume and time-consumption of the service provided may include the size of the third data network, the number of unassigned addresses therein, the number of assigned addresses therein, the volume of data traffic inspected, the number of attacks identified, the number of alerts generated using the attack signature, the volume of rerouted data traffic. Factors identifying a level of increased complexity can be the signature of the identified attack, the degree of network security achieved. Also factors identifying the value of the service provided to the serviced entity may be used such as the turnover of the entity, the field of business of the entity, or the like. In general the method hence may comprise billing the entity for the execution of at least one of the steps performed for the entity, the charge being billed preferably being determined in dependence of one of the size of the network, the number of unassigned addresses monitored, the number of assigned addresses monitored, the volume of data traffic inspected, the number of attacks identified, the number of alerts generated, the signature of the identified attack, the volume of rerouted data traffic, the degree of network security achieved, the turnover of the entity.

Of course, any combination of the previously mentioned factors is possible, in particular with the factors being differently weighted to determine a final charge. The billing can be automated in that the charge is sent together with the warning message or any other information sent in the attack detection process. This advantageously combines the use of the messaging for the attack-handling purpose together with its use for the billing purpose. The double use of the warning message or another information message provides the technical advantage of reducing the traffic flow generated through the attack detection and billing process. At the same time this method can be used to guarantee that the serviced entity is only billed for exactly the service provided.

Another preferred solution for billing is offering the entity a subscription to the attack signature generation and/or attack detection service that allows the serviced entity to profit from the process for a predetermined time, volume of traffic, number of systems or the like.

The service provider may offer his own disinfection server as a hosting unit to be used in combination with the network used by the serviced entity, but it is also possible that the disinfection server is held, maintained, hosted or leased by the serviced entity.

The method can furthermore be used for supporting an entity in the handling of the detected attack by one of providing instructions for use of, assistance in executing, and execution of disinfection program code.

In a further preferred embodiment the service provider may utilize a synergistic effect by providing the attack signature generation and/or attack detection service to several entities, and sharing the resources, such as the signature generator 190, the router 130, the intrusion detector 170 or the disinfection server 180 among the several services. Thereby not only more efficient use of the employed resources can be obtained but also attack-related information between the different networks can be shared and could be utilized to improve the detection quality on the serviced networks. For instance the detection of an attack on one network could lead to a quicker detection on another network since the process of determining an attack signature can be shortened or even eliminated. Also the disinfection mechanism can be shared between the serviced entities thereby reducing their effort and costs related to updating and maintaining the disinfection mechanism. The technical advantage of sharing technical data that is derived from the handling of attacks to the network of one entity to improve the attack handling of another serviced entity will provide an incentive for entities to join a pool of several entities being serviced by the same service provider for intrusion detection. The billing model could in a preferred embodiment be adapted to motivate the participation of entities in a group of entities sharing the signature generation or attack detection resources and employing the same service provider. Hence there is an advantage in providing the method for several entities and using technical data derived from the attack-handling for one of the entities for the attack-handling for another of the entities.

The described method can be coded in form of a computer program element comprising computer program code means which, when loaded in a processor of a data processing system, configures the processor to perform a method for generating attack signatures.

Furthermore the present invention can be realized in hardware, software, or a combination of hardware and software. The method according to the present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein. The present invention can also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods.

A computer program or computer program means in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a device having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.

1. A method for generating attack signatures from requests from a first data network, said attack signatures being usable in a second data network, the method comprising: a reception step for receiving data traffic from the first data network addressed to a number of unassigned addresses in a third data network; an inspection step for inspecting several incidents of said received data traffic for a common data pattern, a determination step for, upon finding a said common data pattern, determining from the corresponding data traffic said attack signature for use in detecting attacks for said second data network, wherein the determination step comprises counting the number of incidents with a common data substring that has a predetermined minimum length and defining as a said attack signature those data substrings whose number exceeds a predetermined number.
 2. A method as claimed in claim 1, wherein the inspection step is preceded by an answer step for spoofing an answer to a said source that sent a request contained in the received data traffic.
 3. A method as claimed in claim 2, wherein said answer step is executed selectively based on a selection criterion that is dependent on the type of protocol of said received data traffic.
 4. A method as claimed in claim 2, wherein for the inspection step the incidents of data traffic are selected only from those of the sources replying to the spoofed answer and sources that have not been subjected to the answer step.
 5. A method as claimed in claim 1, wherein the inspection step comprises sorting the incidents according to a connection attribute such as one of source address, source port, protocol type and destination port of the data traffic.
 6. A method as claimed in claim 1, further comprising sending said attack signature to an intrusion detector assigned to said second data network.
 7. A method as claimed in claim 1, wherein two or all of the first data network, the second data network, and the third data network are selected to be identical, partially identical, connected to each other, or of unitary construction.
 8. A method as claimed in claim 1, further comprising a reception step for receiving on the second data network data traffic addressed to an unassigned address therein; inspecting the received data traffic for data indicative of an attack, using therefor a said attack signature; and, on detection of data indicative of an attack, generating an alert signal.
 9. A method as claimed in claim 8, comprising, on generation of the alert signal, a rerouting step for rerouting the data traffic originating at the address assigned to the data processing system originating the data indicative of the attack to a disinfection address.
 10. A method as claimed in claim 8, comprising, on generation of the alert signal, an alert step for sending an alert signal to the disinfection address, said alert signal preferably comprising data indicative of the attack detected.
 11. A method as claimed in claim 8, further comprising supporting an entity in the handling of the detected attack by one of providing instructions for use of, assistance in executing, and execution of disinfection program code.
 12. A method as claimed in claim 8, further comprising providing a report to said entity containing information related to one of alert, disinfection, rerouting, logging, discarding of data traffic in the context of a detected attack.
 13. A method as claimed in claim 1, further comprising billing said entity for the execution of at least one of the steps, the charge being billed preferably being determined in dependence of one of the size of the network, the number of unassigned addresses monitored, the number of assigned addresses monitored, the volume of data traffic inspected, the number of attacks identified, the number of alerts generated, the signature of the identified attack, the volume of rerouted data traffic, the degree of network security achieved, the turnover of said entity.
 14. A method as claimed in claim 1, further comprising providing said method for several entities and using technical data derived from the attack-handling for one of said entities for the attack-handling for another of said entities.
 15. A method for deploying a signature generation application to an entity, comprising connecting a signature generator to a first data network for generating from requests thereon attack signatures for use in a second data network used by said entity and having a plurality of addresses assigned to data processing systems, setting up said signature generator to generate an attack signature under use of the method claimed in claim 1, setting up said signature generator to send the generated attack signature to an intrusion detector connected to said second data network.
 16. A computer program element comprising computer program code means which, when loaded in a processor of a data processing system, configures the processor to perform a method for generating attack signatures as claimed in claim 1.
 17. An apparatus for generating attack signatures from requests on a first data network, said attack signatures being usable in a second data network, the apparatus comprising: a signature generator for receiving data traffic on the first data network addressed to a number of unassigned addresses in a third data network, inspecting several incidents of said data traffic received for a common data pattern, and upon finding a said data pattern, determining from the corresponding data traffic said attack signature for use in detecting attacks for said second data network, wherein the signature generator is designed to determine the attack signature by counting the number of incidents with a common data substring and defining as a said attack signature those data substrings whose number exceeds a predetermined number.
 18. An apparatus as claimed in claim 17, wherein the signature generator is designed to inspect the received data traffic by spoofing an answer to a said source that sent a said request contained in the data traffic received.
 19. An apparatus as claimed in claim 18, wherein the signature generator is designed to select the incidents of data traffic only from those of the sources replying to the spoofed answer and those sources that have not been subjected to a said spoofed answer.
 20. An apparatus as claimed in claim 17, wherein the signature generator is designed to sort the incidents according to a connection attribute such as one of source address, source port, protocol type, and destination port of the data traffic.
 21. An apparatus as claimed in claim 17, wherein the signature generator is designed to send said attack signature to an intrusion detector assigned to said second data network.
 22. An apparatus as claimed in claim 17, further comprising an intrusion detector for receiving data traffic on the first data network addressed to a number of unassigned addresses in said second data network, inspecting the received data traffic for data indicative of an attack, using a said attack signature, and, on detection of data indicative of an attack, generating an alert signal.
 23. An apparatus as claimed in claim 17, further comprising a router connected to the intrusion detector for rerouting data traffic originating at the address assigned to the data processing system originating the data indicative of the attack to a disinfection address on said third data network.
 24. An apparatus as claimed in claim 23, further comprising a disinfection server assigned to the disinfection address, the disinfection server being equipped for sending, on receipt of the alert signal, a warning message to the address assigned to the data processing system originating the data indicative of the attack.
 25. A method as claimed in claim 1, wherein the inspection step is preceded by an answer step for spoofing an answer to a said source that sent a request contained in the received data traffic; wherein said answer step is executed selectively based on a selection criterion that is dependent on the type of protocol of said received data traffic; wherein for the inspection step the incidents of data traffic are selected only from those of the sources replying to the spoofed answer and sources that have not been subjected to the answer step; wherein the inspection step comprises sorting the incidents according to a connection attribute such as one of source address, source port, protocol type and destination port of the data traffic; further comprising sending said attack signature to an intrusion detector assigned to said second data network; wherein two or all of the first data network, the second data network, and the third data network are selected to be identical, partially identical, connected to each other, or of unitary construction; further comprising a reception step for receiving on the second data network data traffic addressed to an unassigned address therein; inspecting the received data traffic for data indicative of an attack, using therefor a said attack signature; and, on detection of data indicative of an attack, generating an alert signal; further comprising, on generation of the alert signal, a rerouting step for rerouting the data traffic originating at the address assigned to the data processing system originating the data indicative of the attack to a disinfection address; further comprising, on generation of the alert signal, an alert step for sending an alert signal to the disinfection address, said alert signal preferably comprising data indicative of the attack detected; further comprising supporting an entity in the handling of the detected attack by one of providing instructions for use of, assistance in executing, and execution of disinfection program code; further comprising providing a report to said entity containing information related to one of alert, disinfection, rerouting, logging, discarding of data traffic in the context of a detected attack; further comprising billing said entity for the execution of at least one of the steps, the charge being billed preferably being determined in dependence of one of the size of the network, the number of unassigned addresses monitored, the number of assigned addresses monitored, the volume of data traffic inspected, the number of attacks identified, the number of alerts generated, the signature of the identified attack, the volume of rerouted data traffic, the degree of network security achieved, the turnover of said entity; and further comprising providing said method for several entities and using technical data derived from the attack-handling for one of said entities for the attack-handling for another of said entities.
 26. An article of manufacture comprising a computer usable medium having computer readable program code means embodied therein for causing generation of attack signatures from requests from a first data network, said attack signatures being usable in a second data network, the computer readable program code means in said article of manufacture comprising computer readable program code means for causing a computer to effect: a reception step for receiving data traffic from the first data network addressed to a number of unassigned addresses in a third data network; an inspection step for inspecting several incidents of said received data traffic for a common data pattern, a determination step for, upon finding a said common data pattern, determining from the corresponding data traffic said attack signature for use in detecting attacks for said second data network, wherein the determination step comprises counting the number of incidents with a common data substring that has a predetermined minimum length and defining as a said attack signature those data substrings whose number exceeds a predetermined number.
 27. A computer program product comprising a computer usable medium having computer readable program code means embodied therein for causing generation of attack signatures from requests from a first data network, the computer readable program code means in said computer program product comprising computer readable program code means for causing a computer to effect the functions of claim 17.
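
By way of illustration only, the determination step recited in claims 17 and 26 (counting the number of incidents with a common data substring of a predetermined minimum length, and defining as an attack signature those substrings whose number exceeds a predetermined number) might be sketched in Python as follows. The function name `generate_signatures`, the parameters `min_length` and `threshold`, and the use of fixed-length windows in place of arbitrary-length substrings are assumptions made for this sketch and are not taken from the claims.

```python
from collections import defaultdict


def generate_signatures(incidents, min_length, threshold):
    """Return candidate attack signatures from captured payloads.

    incidents  -- list of bytes objects, one per incident of data traffic
                  received at an unassigned address
    min_length -- predetermined minimum substring length (assumed name)
    threshold  -- predetermined number that the incident count must exceed
                  (assumed name)

    Simplification: only windows of exactly min_length bytes are counted.
    Any longer common substring necessarily contains such a window.
    """
    # Map each window to the set of incident indices in which it occurs,
    # so that a substring repeated within one incident is counted once.
    seen_in = defaultdict(set)
    for index, payload in enumerate(incidents):
        for start in range(len(payload) - min_length + 1):
            window = payload[start:start + min_length]
            seen_in[window].add(index)

    # Keep those substrings whose incident count exceeds the threshold.
    return {window for window, ids in seen_in.items() if len(ids) > threshold}


if __name__ == "__main__":
    # Hypothetical payloads, invented for this example.
    sample = [
        b"GET /cmd.exe?x",
        b"POST /cmd.exe",
        b"HEAD /cmd.exe!",
        b"hello world",
    ]
    for signature in sorted(generate_signatures(sample, 8, 2)):
        print(signature)
```

In this sketch the count is taken over distinct incidents rather than over raw occurrences, which is one possible reading of "the number of incidents with a common data substring". A practical signature generator would likely first sort the incidents by a connection attribute, as in claim 20, before counting.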